Handling Missing Data with Graph Representation Learning
Machine learning with missing data has been approached in many different ways, including feature imputation, where missing feature values are estimated based on observed values, and label prediction, where downstream labels are learned directly from incomplete data. However, existing imputation models tend to have strong prior assumptions and cannot learn from downstream tasks, while models targeting label prediction often involve heuristics and can encounter scalability issues. Here we propose GRAPE, a framework for both feature imputation and label prediction. GRAPE tackles the missing data problem using a graph representation, where the observations and features are viewed as two types of nodes in a bipartite graph, and the observed feature values as edges. Under the GRAPE framework, feature imputation is formulated as an edge-level prediction task and label prediction as a node-level prediction task. These tasks are then solved with Graph Neural Networks. Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks, compared with existing state-of-the-art methods.
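The bipartite construction described above can be sketched in a few lines of plain NumPy (a minimal illustration under the paper's stated setup, not the authors' code; in GRAPE the resulting edge list would be fed to a GNN):

```python
import numpy as np

def build_bipartite_graph(X):
    """Build a GRAPE-style bipartite graph from a data matrix X
    (observations x features) where missing entries are NaN.

    Returns the two node counts and an edge list: one edge per
    observed value, connecting observation node i to feature node j,
    carrying the observed value as its edge attribute.
    """
    n_obs, n_feat = X.shape
    rows, cols = np.where(~np.isnan(X))  # indices of observed entries
    edges = list(zip(rows.tolist(), cols.tolist(), X[rows, cols].tolist()))
    return n_obs, n_feat, edges

# 3 observations x 2 features, with two missing values
X = np.array([[1.0, np.nan],
              [np.nan, 2.0],
              [3.0, 4.0]])
n_obs, n_feat, edges = build_bipartite_graph(X)
# four observed values -> four edges; missing entries simply have no edge
```

Imputation then amounts to predicting the attribute of an absent edge, and label prediction to predicting a quantity attached to an observation node.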
Appendix for "Handling Missing Data with Graph Representation Learning"
For GAIN, we use the source code released by the authors. Here we report the wall-clock running time for feature imputation of the different methods at test time. We adopt the same setting as in Section 4.1; the results are shown in Appendix C. The Douban dataset has 3000 observations and 3000 features. The YahooMusic dataset has 1357 observations and 1363 features.
Review for NeurIPS paper: Handling Missing Data with Graph Representation Learning
Dear authors, The reviewers discussed your submission and carefully considered your rebuttal. All agree that the main contribution is a framework for dealing with missing values using bipartite graphs. This is an interesting idea, both for imputing missing values and for making predictions in their presence. The reviewers also appreciated that you added experimental comparisons to two reference methods (missMDA and MIWAE) in your response, as well as experiments on two additional high-dimensional datasets. Nevertheless, although they emphasized that GNNs are used here as a toolbox rather than as the focus of the study, you need to be specific about important aspects of their application (such as architectural novelty and scalability), as noted by two reviewers.
Handling Missing Data with Variational Bayesian Learning of ICA
Missing data is common in real-world datasets and is a problem for many estimation techniques. We have developed a variational Bayesian method to perform Independent Component Analysis (ICA) on high-dimensional data containing missing entries. Missing data are handled naturally in the Bayesian framework by integrating the generative density model. Modeling the distributions of the independent sources with mixtures of Gaussians allows sources to be estimated with different kurtosis and skewness. The variational Bayesian method automatically determines the dimensionality of the data and yields an accurate density model for the observed data without overfitting problems.
Handling Missing Data with SimpleImputer - Analytics Vidhya
This article was published as a part of the Data Science Blogathon. In machine learning, missing data appears as "None" or "NaN" values, and it must be handled before training a model. Missing values can be filled using basic Python, the pandas library, or scikit-learn's SimpleImputer class. Of these, SimpleImputer is the easiest and most convenient to use.
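A minimal example of the SimpleImputer workflow described above (toy data, assumed for illustration): each NaN is replaced by the mean of the observed values in its column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Replace each NaN with the mean of its column
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)
# column means over observed values: 4.0 for column 0, 2.5 for column 1
```

Other built-in strategies include `"median"`, `"most_frequent"`, and `"constant"`; the same fitted imputer can later be applied to new data with `imputer.transform`.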
Handling Missing Data in Decision Trees: A Probabilistic Approach
Khosravi, Pasha, Vergari, Antonio, Choi, YooJung, Liang, Yitao, Broeck, Guy Van den
Decision trees are a popular family of models (Chen & Guestrin, 2016; Prokhorenkova et al., 2018), due to their attractive properties such as interpretability and ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders performance of machine learning models. As such, handling missing data in decision trees is a well studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the "expected prediction" of our models. At learning time, we fine-tune parameters of already learned trees by minimizing their "expected prediction loss" w.r.t.

However, most of these are heuristics in nature (Twala et al., 2008), tailored towards some specific tree induction algorithm, or make strong distributional assumptions about the data, such as the feature distribution factorizing completely (e.g., mean, median imputation (Rubin, 1976)) or according to the tree structure (Quinlan, 1993). As many works have compared the most prominent ones in empirical studies (Batista & Monard, 2003; Saar-Tsechansky & Provost, 2007), there is no clear winner and, ultimately, the adoption of a particular strategy in practice boils down to its availability in the ML libraries employed. In this work, we tackle handling missing data in trees at both learning and deployment time from a principled probabilistic
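The "expected prediction" idea can be illustrated on a toy tree: when a split feature is missing, average the two branches weighted by the probability of taking each. The sketch below assumes independent per-feature marginals supplied by the caller (a simplifying assumption for illustration; the paper uses tractable density estimators, not this heuristic):

```python
def expected_prediction(tree, x, marginals):
    """Expected prediction of a decision tree when some entries of x
    are None (missing). `marginals[f]` is a function t -> P(x_f < t),
    an assumed independent marginal for feature f (illustrative only).

    A node is either a leaf {'value': v} or an internal node
    {'feature': f, 'threshold': t, 'left': ..., 'right': ...},
    where the left branch is taken when x[f] < t.
    """
    if 'value' in tree:
        return tree['value']
    f, t = tree['feature'], tree['threshold']
    if x[f] is not None:  # feature observed: follow a single branch
        branch = tree['left'] if x[f] < t else tree['right']
        return expected_prediction(branch, x, marginals)
    # feature missing: average both branches, weighted by P(x_f < t)
    p_left = marginals[f](t)
    return (p_left * expected_prediction(tree['left'], x, marginals)
            + (1 - p_left) * expected_prediction(tree['right'], x, marginals))

# Stump: predict 1.0 if x0 < 5 else 3.0
stump = {'feature': 0, 'threshold': 5.0,
         'left': {'value': 1.0}, 'right': {'value': 3.0}}
marginals = {0: lambda t: 0.25}  # assume P(x0 < 5) = 0.25
pred = expected_prediction(stump, [None], marginals)
# x0 missing: 0.25 * 1.0 + 0.75 * 3.0 = 2.5
```

Replacing the independence assumption with a tractable joint density model is exactly what distinguishes the paper's principled approach from per-feature heuristics.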
Handling Missing Data For Advanced Machine Learning
Throughout this article, you will become good at spotting, understanding, and imputing missing data. We demonstrate various imputation techniques on a real-world logistic regression task using Python. Properly handling missing data improves both inferences and predictions, and should not be ignored. The first part of this article presents a framework for understanding missing data.
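The imputation-plus-logistic-regression setup described above can be sketched with scikit-learn (toy data assumed here; the article uses a real-world dataset). Putting the imputer inside the pipeline ensures the column statistics learned on the training data are reused at prediction time:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy feature matrix with missing entries, and binary labels
X = np.array([[0.0, 1.0], [1.0, np.nan], [np.nan, 0.0],
              [2.0, 2.0], [3.0, 1.0], [np.nan, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Median imputation followed by logistic regression, as one model
model = make_pipeline(SimpleImputer(strategy="median"),
                      LogisticRegression())
model.fit(X, y)
preds = model.predict(X)
```

Swapping the first pipeline step (e.g. for `SimpleImputer(strategy="mean")` or `sklearn.impute.KNNImputer`) is all it takes to compare imputation techniques on the same downstream task.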
Handling Missing Data with Variational Bayesian Learning of ICA
Chan, Kwokleung, Lee, Te-Won, Sejnowski, Terrence J.
Missing data is common in real-world datasets and is a problem for many estimation techniques. We have developed a variational Bayesian method to perform Independent Component Analysis (ICA) on high-dimensional data containing missing entries. Missing data are handled naturally in the Bayesian framework by integrating the generative density model. Modeling the distributions of the independent sources with mixture of Gaussians allows sources to be estimated with different kurtosis and skewness. The variational Bayesian method automatically determines the dimensionality of the data and yields an accurate density model for the observed data without overfitting problems. This allows direct probability estimation of missing values in the high dimensional space and avoids dimension reduction preprocessing which is not feasible with missing data.